microeco: An R package for data mining in microbial community ecology

Chi Liu, Minjie Yao

Aim

 R language and its packages ecosystem are wonderful tools for data analysis. In microbial community ecology field, many packages can be used for the data analysis, such as vegan[1], ape[2] and picante[3]. However, with the development of the high-throughput sequencing techniques, the increasing data amount and complexity make the data analysis a challenge. There have been some R packages created for the community data analysis in microbial ecology, such as phyloseq[4], microbiome (https://github.com/microbiome/microbiome), microbiomeSeq (http://www.github.com/umerijaz/microbiomeSeq), ampvis2 (https://madsalbertsen.github.io/ampvis2/reference/index.html) and MicrobiomeR(https://github.com/vallenderlab/MicrobiomeR). However, we lack a flexible, comprehensive and modularized R package to analyze and manage the data. We create the microeco R package for this goal (https://github.com/ChiLiubio/microeco).

Main Features

  • R6 Class; fast, flexible and modularized
  • Taxa abundance plotting
  • Venn diagram
  • Alpha diversity
  • Beta diversity
  • Differential abundance analysis
  • Environmental data analysis
  • Null model analysis
  • Network analysis
  • Functional analysis

R6 Class

All the classes in microeco package depend on the R6 class. R6 uses the encapsulated object-oriented programming paradigm, which means that R6 is a profoundly different OO system from S3 and S4 because it is built on encapsulated objects, rather than generic functions. If you are interested in the class feature, read more from ‘Advanced R’ book.

  • A generic is a regular function, so it lives in the global namespace. An R6 method belongs to an object so it lives in a local namespace. This influences how we think about naming. The methods belong to objects, not generics, and you call them like object$method().

  • R6’s reference semantics allow methods to simultaneously return a value and modify an object.

  • Every R6 object has an S3 class that reflects its hierarchy of R6 class.

The use of help documents in the microeco package may be a little different from other packages we often used. If you want to see one of help documents of functions, you should search the name of the class (not the name of the function) and click the link of the function.

Details

microtable class

 Many tools can be used for the bioinformatic analysis, such as QIIME[5], usearch(https://www.drive5.com/usearch/), mothur[6], and RDP(http://rdp.cme.msu.edu/). Although the format of results may be different from various tools, the main files can be classified into the following parts: (1), OTU or ASV table, the species-sample abundance table; (2), taxonomy table, the taxonomy assignments information table; (3), phylogenetic tree, if required. Generally, it is necessary to create a sample information table to store all the detailed sample information, including the environmental data and the missing values (NA). The funtions read_qiime_otu_table() and split_assignments() in package qiimer are appropriate for the parsing of raw QIIME OTU table file. We also provide a function to parse the QIIME2 results to generate microtable object (see README in https://github.com/ChiLiubio/microeco).

 We use microtable class to store all the required data for the following analysis. The 16S rRNA sequencing results in the example data of the package is used to show the main part of the tutorial. This dataset is the 16S rRNA gene Miseq sequencing results of wetland soils in China published by An et al.[7], who surveyed soil bacterial communities in Chinese inland wetlands (IW), coastal wetland (CW) and Tibet plateau wetlands (TW) using 16S rRNA gene amplicon sequencing method. These wetlands include both saline and non-saline samples. The sample information table have 4 columns: “SampleID”, “Group”, “Type” and “Saline”. The column “SampleID” is same with the rownames. The column “Group” represents the IW, CW and TW. The column “Type” represents the sampling region: northeastern region (NE), northwest region (NW), North China area (NC), middle-lower reaches of the Yangtze River (YML), southern coastal area (SC), upper reaches of the Yangtze River (YU), Qinghai-Tibet Plateau (QTP). The column “Saline” represents the saline soils and non-saline soils. In this dataset, the environmental factor table is separated from the sample information table. Another ITS sequencing dataset is also stored in the example data of the package[8].

[1] “data.frame”

  1 2 3 4 5
OTU_4272 1 0 1 1 0
OTU_236 1 4 0 2 35
OTU_399 9 2 2 4 4
OTU_1556 5 18 7 3 2
OTU_32 83 9 19 8 102

[1] “data.frame”

  Kingdom Phylum Class
OTU_4272 k__Bacteria p__Firmicutes c__Bacilli
OTU_236 k__Bacteria p__Chloroflexi c__
OTU_399 k__Bacteria p__Proteobacteria c__Betaproteobacteria
OTU_1556 k__Bacteria p__Acidobacteria c__Acidobacteria
OTU_32 k__Archaea p__Miscellaneous Crenarchaeotic Group c__

Sometimes, your taxonomic table may have some chaotic information, such NA, unidentified and unknown. These information can influence the following taxonomic abundance calculation. So it is necessary to clean this file using the following code. Another important part of this operation is to unify the taxonomic prefix, e.g. transforming D_1__ to p__ for phylum level.

[1] “data.frame”

  SampleID Group Type Saline
1 1 IW NE Non-saline soil
2 2 IW NE Non-saline soil
3 3 IW NE Non-saline soil
4 4 IW NE Non-saline soil
5 5 IW NE Non-saline soil

[1] “data.frame”

  Latitude Longitude Altitude Temperature Precipitation
1 52.96 122.6 432 -4.2 445
2 52.95 122.6 445 -4.3 449
3 52.95 122.6 430 -4.3 449
4 52.95 122.6 430 -4.3 449
5 52.95 122.6 429 -4.3 449

[1] “phylo”

Then, we create an object of microtable class. This operation is very similar with the package phyloseq[4], but microeco is more brief and simpler. The otu_table in the microtable class must be the species-sample format: rownames must be OTU names, colnames must be sample names. The required sample names must be same in rownames of sample_table and colnames of otu_table.

[1] “microtable” “R6”

microtable class: sample_table have 90 rows and 4 columns otu_table have 13628 rows and 90 columns tax_table have 13628 rows and 7 columns phylo_tree have 14096 tips

To make the species and sample information consistent across different files in the dataset object, we can use function tidy_dataset() to trim the dataset.

microtable class: sample_table have 90 rows and 4 columns otu_table have 13628 rows and 90 columns tax_table have 13628 rows and 7 columns phylo_tree have 13628 tips

Then, we remove OTUs which are not assigned in the Kingdom "k__Archaea" or "k__Bacteria".

microtable class: sample_table have 90 rows and 4 columns otu_table have 13628 rows and 90 columns tax_table have 13330 rows and 7 columns phylo_tree have 13628 tips

We also remove OTUs with the taxonomic assignments “mitochondria” or “chloroplast”.

microtable class: sample_table have 90 rows and 4 columns otu_table have 13628 rows and 90 columns tax_table have 13296 rows and 7 columns phylo_tree have 13628 tips

Then, to make the OTUs same in otu_table, tax_table and phylo_tree, we use tidy_dataset() again.

microtable class: sample_table have 90 rows and 4 columns otu_table have 13296 rows and 90 columns tax_table have 13296 rows and 7 columns phylo_tree have 13296 tips

Then we use sample_sums() to check the sequence numbers in each sample.

[1] 10316 37087

Sometimes, in order to reduce the effects of species number on the diversity measurements, we need to perform the resampling to make the sequence number equal for each sample. The function rarefy_samples can invoke the function tidy_dataset automatically before and after the rarefying.

[1] 10000 10000

Then, we calculate the taxa abundance at each taxonomic rank using cal_abund(). This function return a list called taxa_abund containing several data frame of the abundance information at each taxonomic rank. The list is stored in the microtable object automatically.

[1] “list”

If you want to save the taxa abundance file to a local place, use save_abund().

Then, we calculate the alpha diversity. The result is also stored in the object microtable automatically. As an example, we do not calculate phylogenetic diversity.

[1] “data.frame”

We also calculate the distance matrix of beta diversity using function cal_betadiv(). We provide four most frequently used indexes: Bray-curtis, Jaccard, weighted Unifrac and unweighted unifrac.

trans_abund class

 This class is used to transform taxonomic abundance data for plotting the taxa abundance with the ggplot2 package. We first use this class for the bar plot.

We remove the sample names in x axis and add the facet to show abundance according to groups.

Then alluvial plot is implemented in the plot_bar function.

The box plot is an excellent way to intuitionally show data distribution across groups.

Then we show the heatmap with the high abundant genera.

Then, we show the pie chart.

trans_venn class

The trans_venn class is used for venn analysis. To analyze the unique and shared OTUs of groups, we first merge samples according to the “Group” column of sample_table.

When the groups are too many to show with venn plot, we can use petal plot.

Sometimes, we are interested in the unique and shared species. In another words, the composition of the unique or shared species may account for the different and similar parts of ecological characteristics across groups[9]. For this goal, we first transform the results of venn plot to the traditional species-sample table, that is, another object of microtable class.

[1] “microtable” “R6”

We use bar plot to show the composition at the Genus level.

We also try to use pie chart to show the compositions at the Phylum level.

trans_alpha class

 Alpha diversity can be transformed and plotted using trans_alpha class. Creating trans_alpha object can return two data frame: alpha_data and alpha_stat.

Group Measure N Mean SD SE
CW Observed 30 1843 220.6 40.27
CW Chao1 30 2553 338.1 61.73
CW ACE 30 2716 367 67.01
CW Shannon 30 6.308 0.5355 0.09777
CW Simpson 30 0.9897 0.01305 0.002382

Then, we test the differences among groups using the KW rank sum test and anova with multiple comparisons.

Groups Measure Test method p.value Significance
IW vs CW Observed KW 0.0371 *
IW vs TW Observed KW 0.4553
CW vs TW Observed KW 0.3912
IW vs CW vs TW Observed KW 0.155
IW vs CW Chao1 KW 0.002689 **
  Observed Chao1 ACE Shannon Simpson InvSimpson Fisher Coverage
IW a a a a a a a b
TW a ab b a a a a a
CW a b b a a a a a

Now, let us plot the mean and se of alpha diversity for each group, and add the duncan.test (agricolae package) result.

We can also use the boxplot to show the paired comparisons directly.

trans_beta class

 The distance matrix of beta diversity can be transformed and plotted using trans_beta class. The analysis referred to the beta diversity in this class mainly include ordination, group distance, clustering and manova. We first show the ordination using PCoA.

[1] “list”

Then we plot and compare the group distances.

Clustering plot is also a frequently used method.

perMANOVA is often used in the differential test of distances among groups.

Permutation: free
  Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
Group 2 6.121 3.06 10.57 0.1955 0.001
Residuals 87 25.18 0.2895 NA 0.8045 NA
Total 89 31.3 NA NA 1 NA
Groups measure permutations R2 p.value Significance
IW vs CW bray 999 0.1595 0.001 ***
IW vs TW bray 999 0.147 0.001 ***
CW vs TW bray 999 0.1556 0.001 ***
Permutation: free
  Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
Group 2 6.121 3.06 12.01 0.1955 0.001
Type 3 3.783 1.261 4.949 0.1208 0.001
Residuals 84 21.4 0.2548 NA 0.6836 NA
Total 89 31.3 NA NA 1 NA

trans_diff class

 Differential abundance test is a very important part in the microbial community data analysis. It can be used to find the significant taxa in determining the community differences across groups. Currently, trans_diff class have three famous approaches to perform this analysis: metastat[10], LEfSe[11] and random forest. Metastat depends on the permutations and t-test and performs well on the sparse data. It is used for the comparisons between two groups.

LEfSe combines the non-parametric test and linear discriminant analysis. We implement this approach in this package instead of the python version.

Taxa Group pvalue LDA
k__Bacteria|p__Proteobacteria CW 3.21e-11 4.84
k__Bacteria|p__Acidobacteria|c__Acidobacteria IW 8.559e-13 4.79
k__Bacteria|p__Acidobacteria IW 5.749e-12 4.789
k__Bacteria|p__Bacteroidetes TW 1.19e-09 4.773
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria CW 5.475e-12 4.624

We can also plot the abundance of taxa detected using LEfSe.

Then, we show the cladegram of the differential features in the taxonomic tree. There are too many taxa in this dataset. As an example, we only use the highest 200 abundant taxa in the tree and 50 differential features. We only show the full taxonomic label at Phylum level and use letters at other levels to reduce the text overlap.

There may be a problem related with the taxonomic labels in the plot. When the levels used are too many, the taxonomic labels may have too much overlap. However, if we only indicate the Phylum labels, the taxa in the legend with marked letters are too many. At this time, you can select the taxa that you want to show in the plot manually like the following operation.

If you are interested in taxonomic tree, you can also use metacoder package[Foster_Metacoder_2017] to plot the taxonomic tree based on the selected taxa. We do not show the usage here.

The third approach is rf, which depends on the random forest[12, 13] and the non-parametric test. The current method can calculate random forest by bootstrapping like the method in LEfSe and only use the significant features. MeanDecreaseGini is selected as the indicator value in the analysis.

trans_env class

 The environmental variables are very useful in analyzing microbial community structure and assembly mechanisms. We first show the RDA analysis (db-RDA and RDA).

Mantel test can be used to check whether there is significant correlations between environmental variables and distance matrix.

variable_name cor_method corr_res p_res significance
Temperature pearson 0.452 0.001 ***
Precipitation pearson 0.2791 0.001 ***
TOC pearson 0.13 0.003 **
NH4 pearson -0.05539 0.922
NO3 pearson 0.06758 0.049 *
pH pearson 0.4085 0.001 ***
Conductivity pearson 0.2643 0.001 ***
TN pearson 0.1321 0.002 **

The correlations between environmental variables and taxa are important in analyzing and inferring the factors affecting community structure. In this example, we first perform the differential abundance test and random forest analysis to obtain the important genera. Then we use those taxa to perform correlation analysis.

Then, we can plot the correlation results using ggplot2 or pheatmap.

Sometimes, it is necessary to study the correlations between environmental variables and taxa for different groups.

If you are concerned with the relationship between environmental factors and alpha diversity, you can also use this function.

trans_nullmodel class

In recent decades, the integration of phylogenetic analysis and null model promotes the inference of niche and neutral influences on community assembly more powerfully by adding a phylogeny dimension [14, 15]. The trans_nullmodel class provides an encapsulation, including the calculation of the phylogenetic signal, beta mean pairwise phylogenetic distance (betaMPD), beta mean nearest taxon distance (betaMNTD), beta nearest taxon index (betaNTI), beta net relatedness index (betaNRI) and Bray-Curtis-based Raup-Crick (RCbray). The approach for phylogenetic signal analysis is based on the mantel correlogram [16], in which the change of phylogenetic signal is intuitional and clear compared to other approaches. The algorithms of betaMNTD and betaMPD have been optimized to be faster than those in the picante package [3]. The combinations between RCbray and betaNTI (or betaNRI) can be used to infer the strength of each ecological process dominating the community assembly under the specific hypothesis [15]. This can be achievable by the function cal_process() to parse the percentage of each inferred process. We first check the phylogenetic signal.

betaNRI(ses.betampd) is used to show the ‘basal’ phylogenetic turnover[16]. Compared to betaNTI, it can capture more turnover information associated with the deep phylogeny. It is noted that there are many null models with the development in the several decades. In the trans_nullmodel class, we randomized the phylogenetic relatedness of species. This shuffling approach fix the observed levels of species α-diversity and β-diversity to explore whether the observed phylogenetic turnover significantly differ from null model that phylogenetic relatedness among species are random.

If we want to plot the betaNRI, we can use plot_group_distance function in trans_beta class. For example, the results showed that the mean betaNRI of TW is extremely and significantly larger that those in CW and IW, revealing that the basal phylogenetic turnover in TW is high.

Sometimes, if you want to perform null model analysis for each group individually, such as one group as one species pool, you can calculate the results for each group, respectively. We can find that, when we perform betaNRI for each group respectively, mean betaNRI between CW and TW are not significantly different, and they are both significantly higher than that in IW, revealing that the strength of variable selection in CW and TW may be similar under the condition that each area is considered as a specific species pool.

BetaNTI(ses.betamntd) can be used to indicate the phylogenetic terminal turnover [15].

1 2 3 4 5
0 -5.637 -5.701 -5.758 -5.531
-5.637 0 -6.101 -6.275 -6.013
-5.701 -6.101 0 -6.333 -6.197
-5.758 -6.275 -6.333 0 -6.141
-5.531 -6.013 -6.197 -6.141 0

RCbray (Bray-Curtis-based Raup-Crick) can be calculated using function cal_rcbray() to assess whether the compositional turnover was governed primarily by drift [17]. We applied null model to simulate species distribution by randomly sampling individuals from each species pool with preserving species occurrence frequency and sample species richness [16].

As an example, we also calculate the proportion of the inferred processes on the community assembly as shown in the references [15, 16]. In the example, the fraction of pairwise comparisons with significant betaNTI values (|βNTI| > 2) is the estimated influence of Selection; βNTI > 2 represents the heterogeneous selection; βNTI < -2 represents the homogeneous selection. The value of RCbray characterizes the magnitude of deviation between observed Bray–Curtis and Bray–Curtis expected under the randomization; a value of |RCbray| > 0.95 was considered significant. The fraction of all pairwise comparisons with |βNTI| < 2 and RCbray > +0.95 was taken as the influence of Dispersal Limitation combined with Drift. The fraction of all pairwise comparisons with |βNTI| < 2 and RCbray < -0.95 was taken as an estimate for the influence of Homogenizing Dispersal. The fraction of all pairwise comparisons with |βNTI| < 2 and |RCbray| < 0.95 estimates the influence of Drift acting alone.

process percentage
variable selection 4.419
homogeneous selection 48.71
dispersal limitation 0
homogeneous dispersal 8.739
drift 38.13

trans_network class

 Network is a frequently used approach to study the co-occurrence patterns in microbial ecology[18–20]. In this part, we describe all the core contents in the trans_network class. The network construction approaches can be classified into two types: correlation-based and non correlation-based. Several approaches can be used to calculate correlations and significances.

We first introduce the correlation-based network. The parameter cal_cor in trans_network is used for selecting the correlation calculation method.

The parameter COR_cut can be used to select the correlation threshold. Furthermore, COR_optimization = TRUE represent using RMT theory to find the optimized correlation threshold instead of the COR_cut[18].

We plot the network and present the node colors according to the calculated modules in Gephi.

Now, we show the node colors with the Phylum information and the edges colors with the positive and negative correlations. All the data used has been stored in the network.gexf file, including modules classifications, Phylum information and edges classifications.

Property Value
Vertex 407
Edge 1989
Average_degree 9.774
Average_path_length 3.878
Network_diameter 9
Clustering_coefficient 0.4698
Density 0.02407
Heterogeneity 1.194
Centralization 0.09908
  z module p taxa_roles
OTU_50 -1.305 M2 0 Peripheral nodes
OTU_1 -0.04067 M2 0 Peripheral nodes
OTU_55 -1.239 M2 0 Peripheral nodes
OTU_13824 -0.2403 M2 0 Peripheral nodes
OTU_151 -1.372 M2 0.4444 Peripheral nodes

Now, we show the eigengene analysis of modules. The eigengene of a module, i.e. the first principal component of PCA, represents the main variance of the abundance in the species of the module.

Then we perform correlation heatmap to show the relationships between eigengenes and environmental factors.

The function cal_sum_links() is used to sum the links (edge) number from one taxa to another or in the same taxa. The function plot_sum_links() is used to show the result from the function cal_sum_links(). This is very useful to fast see how many nodes are connected between different taxa or within one taxa. In terms of “Phylum” level in the tutorial, the function cal_sum_links() sum the linkages number from one Phylum to another Phylum or the linkages in the same Phylum. So the numbers along the outside of the circular plot represent how many edges or linkages are related with the Phylum. For example, in terms of Proteobacteria, there are roughly total 900 edges associated with the OTUs in Proteobacteria, in which roughly 200 edges connect both OTUs in Proteobacteria and roughly 150 edges connect the OTUs from Proteobacteria with the OTUs from Chloroflexi.

The subset_network() function can be used to extract a part of nodes and edges among these nodes from the network. In this function, you should provide the nodes you need using the node parameter.

Then we show the next implemented network construction approach: SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference) network in SpiecEasi R package [21].

We also introduce the third network construction approach: Probabilistic Graphical Models (PGM), which is implemented in julia package FlashWeave[22]. If you want to use this method like the following code, you should first install julia language in your computer and the FlashWeave package, and add the julia in the computer path (see FlashWeave part in https://github.com/ChiLiubio/microeco).

trans_func class

 Ecological researchers are usually interested in the the funtional profiles of microbial communities, because functional or metabolic data is powerful to explain the structure and dynamics of microbial communities and to infer the underlying mechanisms. As metagenomic sequencing is complicated and expensive, using amplicon sequencing data to predict functional profiles is a good choice. Several software are often used for this goal, such as PICRUSt[23], Tax4Fun[24] and FAPROTAX[25, 26]. These tools are great to be used for the prediction of functional profiles based on the prokaryotic communities from sequencing results. In addition, it is also important to obtain the functions for each taxa or OTU, not just the whole profile of communities. But it is hard to know exact functions of each OTU. FAPROTAX database is a collection of the traits and characteristics of prokaryotes based on the known research results published in books and literatures. We match the taxonomic information of prokaryotes against this database to identify the traits of prokaryotes on biogeochemical roles. We also implement the FUNGuild database[27] to identify the traits of fungi.

  methanotrophy acetoclastic_methanogenesis
OTU_4272 0 0
OTU_236 0 0
OTU_399 0 0
OTU_1556 0 0
OTU_32 0 0

The percentages of the OTUs having the same trait can reflect the functional redundancy of this function in the community or the module in the network.

methanotrophy acetoclastic_methanogenesis
0.39 0.04
0.27 0
0.48 0
0.48 0
0.56 0

Tax4Fun requires a strict input file demand on the taxonomic information. To analyze the trimmed or changed OTU data in R with Tax4Fun, we provide a link to the Tax4Fun functional prediction.

K00001; alcohol dehydrogenase [EC:1.1.1.1] K00002; alcohol dehydrogenase (NADP+) [EC:1.1.1.2]
0.0004823 5.942e-06
0.0005266 4.017e-06
0.0005054 6.168e-06
0.0005109 5.888e-06
0.0005083 5.547e-06

Now, we use pathway file to analyze the abundance of pathway.

microtable class: sample_table have 90 rows and 4 columns otu_table have 284 rows and 90 columns tax_table have 341 rows and 3 columns

Now, we need to trim data and calculate abundance.

microtable class: sample_table have 90 rows and 4 columns otu_table have 284 rows and 90 columns tax_table have 284 rows and 3 columns Taxa abundance: calculated for level_1,level_2,level_3

Then, we can plot the abundance.

We can also do something else. For example, we can use lefse to test the differences of the abundances and find the important enriched pathways across groups.

Now, we use the ITS amplicon sequencing dataset as an example to show the use of FUNGuild database[27].

Notes

add layers to plot

Most of the plots are generated by applying the ggplot2 package. The important parameters in the plotting functions are configured according to our experience. If the inner parameters can not meet the use, the user can add the layers to the plot like the following operation or make the plot using the data (generally data.frame class) stored in the object.

clone

R6 class has a special copy mechanism which is different from S3 and S4. If you want to copy an object completely, you should use function clone instead of direct assignment.

[1] FALSE

[1] TRUE

change object

All the classes are set public, meaning that you can change, add or remove the objects stored in them as you want.

References

1. Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, et al. Vegan: Community ecology package. 2019.

2. Paradis E, Schliep K. Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 2018; 35: 526–528.

3. Kembel SW, Cowan PD, Helmus MR, Cornwell WK, Morlon H, Ackerly DD, et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 2010; 26: 1463–1464.

4. Mcmurdie PJ, Holmes S. Phyloseq: An r package for reproducible interactive analysis and graphics of microbiome census data. Plos One 2013; 8: e61217.

5. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 2010; 7: 335–336.

6. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 2009; 75: 7537–41.

7. An J, Liu C, Wang Q, Yao M, Rui J, Zhang S, et al. Soil bacterial community structure in chinese wetlands. Geoderma 2019; 337: 290–299.

8. Gao C, Montoya L, Xu L, Madera M, Hollingsworth J, Purdom E, et al. Strong succession in arbuscular mycorrhizal fungal communities. The ISME journal 2019; 13: 214.

9. Mendes R, Kruijt M, De BI, Dekkers E, Van der VM, Schneider JH, et al. Deciphering the rhizosphere microbiome for disease-suppressive bacteria. Science 2011; 332: 1097–100.

10. White J, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Computational Biology 2009; 5: e1000352.

11. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biology 2011; 12: R60.

12. Beck D, Foster JA. Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PloS one 2014; 9: e87830.

13. Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, et al. Human gut microbiome viewed across age and geography. Nature 2012; 486: 222–227.

14. Webb CO, Ackerly DD, McPeek MA, Donoghue MJ. Phylogenies and community ecology. Annu Rev Ecol Syst 2002; 33: 475–505.

15. Stegen JC, Lin X, Fredrickson JK, Chen X, Kennedy DW, Murray CJ, et al. Quantifying community assembly processes and identifying features that impose them. The ISME Journal 2013; 7: 2069–2079.

16. Liu C, Yao M, Stegen JC, Rui J, Li J, Li X. Long-term nitrogen addition affects the phylogenetic turnover of soil microbial community responding to moisture pulse. Scientific Reports 2017; 7: 17492.

17. Chase JM, Kraft NJ, Smith KG, Vellend M, Inouye BD. Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Ecosphere 2011; 2: art24.

18. Deng Y, Jiang Y-H, Yang Y, He Z, Luo F, Zhou J. Molecular ecological network analyses. BMC bioinformatics 2012; 13: 113.

19. Faust K, Raes J. Microbial interactions: From networks to models. Nature Reviews Microbiology 2012; 10: 538–550.

20. Coyte KZ, Schluter J, Foster KR. The ecology of the microbiome: Networks, competition, and stability. Science 2015; 350: 663–666.

21. Kurtz ZD, Muller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 2015; 11: e1004226.

22. Tackmann J, Matias Rodrigues JF, Mering C von. Rapid inference of direct interactions in large-scale ecological networks from heterogeneous microbial sequencing data. Cell Systems 2019; 9: 286–296 e8.

23. Langille MG, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology 2013; 31: 814–21.

24. Aßhauer KP, Wemheuer B, Daniel R, Meinicke P. Tax4Fun: Predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 2015; 31: 2882–2884.

25. Louca S, Jacques SM, Pires AP, Leal JS, Srivastava DS, Parfrey LW, et al. High taxonomic variability despite stable functional structure across microbial communities. Nature Ecology & Evolution 2016; 1: 0015.

26. Louca S, Parfrey LW, Doebeli M. Decoupling function and taxonomy in the global ocean microbiome. Science 2016; 353: 1272.

27. Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J, et al. FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology 2016; 20: 241–248.